74 research outputs found

    Confidence criterion for speech balloon segmentation

    This short paper investigates how to improve the confidence of speech balloon segmentation algorithms for comic book images. It stems from the need for precise indications of the quality of automatic processing, so that each segmented region can be accepted or rejected as a valid result according to the application and without requiring any ground truth. We discuss several applications, such as result quality assessment for companies and automatic ground truth creation from high-confidence results to train machine-learning-based systems. We present some ideas for combining several sources of domain knowledge (e.g. shape, text, etc.) to produce an improved confidence criterion.

    Segmentation d'Images Texturées Couleur à l'aide de modèles paramétriques pour approcher la distribution des erreurs de prédiction linéaires

    We propose novel a priori parametric models to approximate the distribution of the two-dimensional multichannel linear prediction error in order to improve the performance of color texture segmentation algorithms. Two-dimensional linear prediction models are used to characterize the spatial structures in color images. The multivariate linear prediction error of these texture models is approximated with the Wishart distribution and with multivariate Gaussian mixture models. A novel color texture segmentation framework based on these models and a spatial regularization model of initial class label fields is presented. For the three color spaces tested, experimental results show that the proposed method achieves a lower percentage segmentation error than the use of a multivariate Gaussian law.

    Real-Time Smile Detection using Deep Learning

    Real-time smile detection from facial images is useful in many real-world applications, such as automatic photo capturing in mobile phone cameras or interactive distance learning. In this paper, we study different architectures of deep object detection networks for solving the real-time smile detection problem. We then propose a combination of a lightweight convolutional neural network architecture (BKNet) with an efficient object detection framework (RetinaNet). The evaluation on two datasets (GENKI-4K, UCF Selfie) with a mid-range hardware device (GTX TITAN Black) shows that our proposed method improves both the accuracy and the inference time of the original RetinaNet, reaching real-time performance. Compared with the state-of-the-art object detection framework YOLO, our method has a higher inference time but still reaches real-time performance and obtains higher smile detection accuracy on both datasets.

    MIDV-2020: A Comprehensive Benchmark Dataset for Identity Document Analysis

    Identity document recognition is an important sub-field of document analysis. It deals with the tasks of robust document detection, type identification, and text field recognition, as well as identity fraud prevention and document authenticity validation, given photos, scans, or video frames of a captured identity document. A significant amount of research has been published on this topic in recent years; however, a chief difficulty for such research is the scarcity of datasets, because the subject matter is protected by security requirements. The few available datasets of identity documents lack diversity of document types, capturing conditions, or variability of document field values. In addition, the published datasets were typically designed for only a subset of document recognition problems, not for complex identity document analysis. In this paper, we present the MIDV-2020 dataset, which consists of 1000 video clips, 2000 scanned images, and 1000 photos of 1000 unique mock identity documents, each with unique text field values and unique artificially generated faces, with rich annotation. For the presented benchmark dataset, baselines are provided for tasks such as document location and identification, text field recognition, and face detection. With 72409 annotated images in total, the proposed dataset is, as of the date of publication, the largest publicly available identity document dataset with variable artificially generated data, and we believe it will prove invaluable for the advancement of the field of document analysis and recognition. The dataset is available for download at ftp://smartengines.com/midv-2020 and http://l3i-share.univ-lr.fr